The "mind-brain supervenience" conjecture suggests that all mental properties are derived from the 
physical properties of the brain. To address the question of whether the mind supervenes on the brain, we 
frame a supervenience hypothesis in rigorous statistical terms. Specifically, we propose a modified version of 
supervenience (called ε-supervenience) that is amenable to experimental investigation and statistical 
analysis. To illustrate this approach, we perform a thought experiment showing how the probabilistic 
theory of pattern recognition can be used to make a one-sided determination of ε-supervenience. The 
physical property of the brain employed in this analysis is the graph describing brain connectivity (i.e., the 
brain-graph or connectome). ε-supervenience allows us to determine whether a particular mental property 
can be inferred from one's connectome to within any given positive misclassification rate, regardless of the 
relationship between the two. This may provide further motivation for cross-disciplinary research between 
neuroscientists and statisticians. 



Questions and assumptions about mind-brain supervenience go back at least as far as Plato's dialogues 
circa 400 BCE 1 . While there are many different notions of supervenience, we find Davidson's canonical 
description particularly illustrative 2 : 

[mind-brain] supervenience might be taken to mean that there cannot be two events alike in all physical 
respects but differing in some mental respect, or that an object cannot alter in some mental respect without 
altering in some physical respect. 

Colloquially, supervenience means "there cannot be a mind- difference without a physical-difference." This 
philosophical conjecture has potentially widespread implications. For example, neural network theory and 
artificial intelligence often implicitly assume a local version of mind-brain supervenience 3, 4 . Cognitive neuroscience 
similarly seems to operate under such assumptions 5 . Philosophers continue to debate and refine notions of 
supervenience 6 . Yet, to date, relatively scant attention has been paid to what might be empirically learned about 
supervenience. 

In this work we attempt to bridge the gap between philosophical conjecture and empirical investigations by 
casting supervenience in a probabilistic framework amenable to hypothesis testing. We then use the probabilistic 
theory of pattern recognition to determine the limits of what one can and cannot learn about supervenience 
through data analysis. The implications of this work are varied. It provides a probabilistic framework for 
converting philosophical conjectures into statistical hypotheses that are amenable to experimental investigation, 
which allows the philosopher to gain empirical support for her rational arguments. This leads to the construction 
of the first explicit proof (to our knowledge) of a universally consistent classifier on graphs, and the first 
demonstration of the tractability of answering supervenience questions. Supervenience therefore seems to be a 
useful but under-utilized concept for neuroscientific investigations. This work should provide further 
motivation for cross-disciplinary efforts across three fields (philosophy, statistics, and neuroscience) with 
shared goals but mostly disjoint jargon and methods of analysis. 

Results 

Statistical supervenience: a definition. Let $\mathcal{M} = \{m_1, m_2, \ldots\}$ be the space of all possible minds and let 
$\mathcal{B} = \{b_1, b_2, \ldots\}$ be the set of all possible brains. $\mathcal{M}$ includes a mind for each possible collection of thoughts, 
memories, beliefs, etc. $\mathcal{B}$ includes a brain for each possible position and momentum of all subatomic particles 
within the skull. Given these definitions, Davidson's conjecture may be stated concisely and formally: 
$m \neq m' \Rightarrow b \neq b'$, where $(m, b), (m', b') \in \mathcal{M} \times \mathcal{B}$ are mind-brain pairs. This mind-brain supervenience relation 
does not imply an injective relation, a causal relation, or an identity relation (see Appendix 1 for more details and 
some examples). To facilitate both statistical analysis and empirical investigation, we convert this local 
supervenience relation from a logical to a probabilistic relation. 

Let $F_{MB}$ denote a joint distribution of minds and brains. 
Statistical supervenience can then be defined as follows: 

Definition 1. $\mathcal{M}$ is said to statistically supervene on $\mathcal{B}$ for distribution 
$F = F_{MB}$, denoted $\mathcal{M} \overset{S}{\sim}_F \mathcal{B}$, if and only if $P[m \neq m' \mid b = b'] = 0$, 
or equivalently $P[m = m' \mid b = b'] = 1$. 

Statistical supervenience is therefore a probabilistic relation on sets 
which could be considered a generalization of correlation (see 
Appendix 1 for details). 

Statistical supervenience is equivalent to perfect classification 
accuracy. If minds statistically supervene on brains, then if two 
minds differ, there must be some brain-based difference to account 
for the mental difference. This means that there must exist a 
deterministic function $g^*$ mapping each brain to its supervening 
mind. One could therefore, in principle, know this function. When 
the space of all possible minds is finite (that is, $|\mathcal{M}| < \infty$), any 
function $g: \mathcal{B} \to \mathcal{M}$ mapping from brains to minds is called a 
classifier. Define the misclassification rate, the probability that $g$ 
misclassifies $b$ under distribution $F = F_{MB}$, as 

$$L_F(g) = P[g(B) \neq M] = \sum_{(m,b) \in \mathcal{M} \times \mathcal{B}} \mathbb{I}\{g(b) \neq m\} \, P[B = b, M = m], \qquad (1)$$

where $\mathbb{I}\{\cdot\}$ denotes the indicator function taking value unity 
whenever its argument is true and zero otherwise. The Bayes 
optimal classifier $g^*$ minimizes $L_F(g)$ over all classifiers: 
$g^* = \operatorname{argmin}_g L_F(g)$. The Bayes error, or Bayes risk, $L^* = L_F(g^*)$, is the 
minimum possible misclassification rate. 

The primary result of casting supervenience in a statistical framework 
is the theorem below, which follows immediately from 
Definition 1 and Eq. (1): 

Theorem 1. $\mathcal{M} \overset{S}{\sim}_F \mathcal{B} \Leftrightarrow L^* = 0$. 
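
Theorem 1 can be checked by hand on a small finite example. The following Python sketch is illustrative only and is not part of the original analysis: the function name `bayes_error` and the two toy joint distributions are assumptions introduced here. It computes $L^*$ by letting the Bayes classifier pick, for each brain, the mind with the largest joint probability.

```python
from collections import defaultdict

def bayes_error(joint):
    """Bayes error L* for a finite joint distribution.

    `joint` maps (mind, brain) pairs to probabilities P[M=m, B=b].
    The Bayes classifier assigns each brain b the mind with the largest
    joint probability; L* is the probability mass it misclassifies.
    """
    best = defaultdict(float)  # max over m of P[M=m, B=b], per brain b
    total = 0.0
    for (mind, brain), prob in joint.items():
        best[brain] = max(best[brain], prob)
        total += prob
    return total - sum(best.values())

# Hypothetical toy distribution: every brain is paired with exactly one mind.
supervenient = {("m1", "b1"): 0.5, ("m1", "b2"): 0.2, ("m2", "b3"): 0.3}
# Hypothetical toy distribution: brain b1 occurs with two different minds.
non_supervenient = {("m1", "b1"): 0.4, ("m2", "b1"): 0.1, ("m2", "b2"): 0.5}

print(bayes_error(supervenient))      # zero, up to floating-point rounding
print(bayes_error(non_supervenient))  # approximately 0.1
```

In the first distribution each brain determines its mind, so the Bayes error vanishes; in the second, the mass 0.1 placed on the pair (m2, b1) cannot be classified correctly by any function of the brain alone.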

The above argument shows (for the first time to our knowledge) 
that statistical supervenience and zero Bayes error are equivalent. 
Statistical supervenience can therefore be thought of as a constraint 
on the possible distributions on minds and brains. Specifically, let $\mathcal{F}$ 
denote the set of all possible joint distributions on minds and brains, 
and let $\mathcal{F}_s = \{F_{MB} \in \mathcal{F} : L^* = 0\}$ be the subset of distributions 
for which supervenience holds. Theorem 1 implies that $\mathcal{F}_s \subsetneq \mathcal{F}$. 
Mind-brain supervenience is therefore an extremely restrictive 
assumption about the possible relationships between minds and 
brains. Such a restrictive assumption calls for empirical 
evaluation via, for instance, a hypothesis test. 

The non-existence of a viable statistical test for supervenience. The 
above theorem implies that if we desire to know whether minds 
supervene on brains, we can check whether $L^* = 0$. Unfortunately, 
$L^*$ is typically unknown. Fortunately, we can approximate $L^*$ using 
training data. 

Assume that training data $T_n = \{(M_1, B_1), \ldots, (M_n, B_n)\}$ are each 
sampled identically and independently (iid) from the true (but 
unknown) joint distribution $F = F_{MB}$. Let $g_n$ be a classifier induced 
by the training data, $g_n: \mathcal{B} \times (\mathcal{M} \times \mathcal{B})^n \to \mathcal{M}$. The misclassification 
rate of such a classifier is given by 

$$L_F(g_n) = \sum_{(m,b) \in \mathcal{M} \times \mathcal{B}} \mathbb{I}\{g_n(b; T_n) \neq m\} \, P[B = b, M = m], \qquad (2)$$

which is a random variable due to the dependence on a randomly 
sampled training set $T_n$. Calculating the expected misclassification 
rate $E[L_F(g_n)]$ is often intractable in practice because it requires 
a sum over all possible training sets. Instead, the expected 
misclassification rate can be approximated by "hold-out" error. Let 
$H_{n'} = \{(M_{n+1}, B_{n+1}), \ldots, (M_{n+n'}, B_{n+n'})\}$ be a set of $n'$ hold-out 
samples, each sampled iid from $F_{MB}$. The hold-out approximation 
to the misclassification rate is given by 

$$\hat{L}_F^{n'}(g_n) = \frac{1}{n'} \sum_{(M_i, B_i) \in H_{n'}} \mathbb{I}\{g_n(B_i; T_n) \neq M_i\} \approx E[L_F(g_n)] \geq L^*. \qquad (3)$$

By definition of $g^*$, the expectation of $\hat{L}_F^{n'}(g_n)$ (with respect to both 
$T_n$ and $H_{n'}$) is greater than or equal to $L^*$ for any $g_n$ and all $n$. Thus, 
we can construct a hypothesis test for $L^*$ using the surrogate $\hat{L}_F^{n'}(g_n)$. 
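
The hold-out estimate is simply the fraction of held-out pairs that the trained classifier gets wrong. A minimal Python sketch follows; the names `holdout_error` and `threshold_rule`, and the one-dimensional toy "brains", are hypothetical and introduced only for illustration.

```python
def holdout_error(classifier, holdout):
    """Fraction of hold-out (mind, brain) pairs that `classifier` gets wrong."""
    if not holdout:
        raise ValueError("hold-out set must be non-empty")
    errors = sum(1 for mind, brain in holdout if classifier(brain) != mind)
    return errors / len(holdout)

# Hypothetical stand-in for a trained classifier g_n(.; T_n): a fixed
# threshold rule on a one-dimensional brain measurement.
def threshold_rule(brain):
    return "m1" if brain > 0 else "m0"

# Hypothetical hold-out pairs (mind, brain).
holdout = [("m1", 2.0), ("m0", -1.0), ("m1", -3.0), ("m0", -2.0)]
estimate = holdout_error(threshold_rule, holdout)
print(estimate)  # 0.25: one of the four pairs is misclassified
```

Because the hold-out pairs are not used to build the classifier, each error indicator is an independent Bernoulli trial given the training set, which is what makes the error count usable as a test statistic.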

A statistical test proceeds by specifying the allowable Type I error 
rate $\alpha > 0$ and then calculating a test statistic. The p-value, the 
probability of rejecting the least favorable null hypothesis (the simple 
hypothesis within the potentially composite null which is closest to 
the boundary with the alternative hypothesis), is the probability of 
observing a result at least as extreme as the observed. In other words, 
the p-value is the cumulative distribution function of the test statistic 
evaluated at the observed test statistic with parameter given by the 
least favorable null distribution. We reject if the p-value is less than $\alpha$. 
A test is consistent whenever its power (the probability of rejecting 
the null when it is indeed false) goes to unity as $n \to \infty$. For any 
statistical test, if the p-value converges in distribution to $\delta_0$ (point 
mass at zero), then whenever $\alpha > 0$, power goes to unity. 

Based on the above considerations, we might consider the following 
hypothesis test: $H_0: L^* > 0$ and $H_A: L^* = 0$; rejecting the null 
indicates that $\mathcal{M} \overset{S}{\sim}_F \mathcal{B}$. Unfortunately, the alternative hypothesis lies 
on the boundary, so the p-value is always equal to unity 7 . From this, 
Theorem 2 follows immediately: 

Theorem 2. There does not exist a viable test of $\mathcal{M} \overset{S}{\sim}_F \mathcal{B}$. 

In other words, we can never reject $L^* > 0$ in favor of supervenience, 
no matter how much data we obtain. 

Conditions for a consistent statistical test for ε-supervenience. To 
proceed, therefore, we introduce a relaxed notion of supervenience: 

Definition 2. $\mathcal{M}$ is said to ε-supervene on $\mathcal{B}$ for distribution 
$F = F_{MB}$, denoted $\mathcal{M} \overset{\varepsilon}{\sim}_F \mathcal{B}$, if and only if $L^* < \varepsilon$ for some $\varepsilon > 0$. 

Given this relaxation, consider the problem of testing for ε-supervenience: 

$$H_0^{\varepsilon}: L^* \geq \varepsilon, \qquad H_A^{\varepsilon}: L^* < \varepsilon.$$

Let $\hat{n} = n' \hat{L}_F^{n'}(g_n)$ be the test statistic. The distribution of $\hat{n}$ is available 
under the least favorable null distribution. For the above 
hypothesis test, the p-value is therefore the binomial cumulative 
distribution function with parameter $\varepsilon$; that is, p-value = 
$B(\hat{n}; n', \varepsilon) = \sum_{k \in [\hat{n}]_0} \mathrm{Binomial}(k; n', \varepsilon)$, where $[\hat{n}]_0 = \{0, 1, \ldots, \hat{n}\}$. 

We reject whenever this p-value is less than $\alpha$; rejection implies that 
we are $100(1 - \alpha)\%$ confident that $\mathcal{M} \overset{\varepsilon}{\sim}_F \mathcal{B}$. 

For the above ε-supervenience statistical test, if $g_n \to g^*$ as $n \to \infty$, 
then $\hat{L}_F^{n'}(g_n) \to L^*$ as $n, n' \to \infty$. Thus, if $L^* < \varepsilon$, power goes to unity. 
The definition of ε-supervenience therefore admits, for the first time 
to our knowledge, a viable statistical test of supervenience, given a 
specified $\varepsilon$ and $\alpha$. Moreover, this test is consistent whenever $g_n$ 
converges to the Bayes classifier $g^*$. 

The existence and construction of a consistent statistical test for 
ε-supervenience. The above considerations indicate the existence of 
a consistent test for ε-supervenience whenever the classifier used is 
consistent. To actually implement such a test, one must be able to (i) 
measure mind/brain pairs and (ii) have a consistent classifier $g_n$. 
Unfortunately, we do not know how to measure the entirety of 
one's brain, much less one's mind. We therefore must restrict our 
interest to a mind/brain property pair. A mind (mental) property 
might be a person's intelligence, psychological state, current 
thought, gender identity, etc. A brain property might be the 
number of cells in a person's brain at some time $t$, or the collection of 
spike trains of all neurons in the brain during some time period $t$ to $t'$. 
Regardless of the details of the specifications of the mental property 
and the brain property, given such specifications, one can assume a 
model, $\mathcal{F}$. We desire a classifier $g_n$ that is guaranteed to be consistent, 
no matter which of the possible distributions $F_{MB} \in \mathcal{F}$ is the true 
distribution. A classifier with such a property is called a universally 
consistent classifier. Below, under a very general mind-brain model 
$\mathcal{F}$, we construct a universally consistent classifier. 

Gedankenexperiment 1. Let the physical property under consideration 
be brain connectivity structure, so $b$ is a brain-graph 
("connectome") with vertices representing neurons (or collections 
thereof) and edges representing synapses (or collections 
thereof). Further let $\mathcal{B}$, the brain observation space, be the collection 
of all graphs on a given finite number of vertices, and let $\mathcal{M}$, 
the mental property observation space, be finite. Now, imagine 
collecting very large amounts of very accurate identically and 
independently sampled brain-graph data and associated mental 
property indicators from $F_{MB}$. A $k_n$-nearest neighbor classifier 
using a Frobenius norm is universally consistent (see Methods 
for details). The existence of a universally consistent classifier 
guarantees that eventually (in $n, n'$) we will be able to conclude 
$\mathcal{M} \overset{\varepsilon}{\sim}_F \mathcal{B}$ for this mind-brain property pair, if indeed ε-supervenience 
holds. This logic holds for directed graphs or multigraphs or 
hypergraphs with discrete edge weights and vertex attributes, as 
well as unlabeled graphs (see ref. 8 for details). Furthermore, the 
proof holds for other matrix norms (which might speed up convergence 
and hence reduce the required $n$), and the regression 
scenario where $|\mathcal{M}|$ is infinite (again, see Methods for details). 

Thus, under the conditions stated in the above Gedankenexperiment, 
universal consistency yields: 

Theorem 3. $\mathcal{M} \overset{\varepsilon}{\sim}_F \mathcal{B} \Rightarrow \beta \to 1$ as $n, n' \to \infty$. 

Unfortunately, the rate of convergence of $L_F(g_n)$ to $L_F(g^*)$ depends 
on the (unknown) distribution $F = F_{MB}$ 9 . Furthermore, arbitrarily 
slow convergence theorems regarding the rate of convergence of 
$L_F(g_n)$ to $L_F(g^*)$ demonstrate that there is no universal $n, n'$ which 
will guarantee that the test has power greater than any specified target 
$\beta > \alpha$ 10 . For this reason, the test outlined above can provide only a 
one-sided conclusion: if we reject we can be $100(1 - \alpha)\%$ confident 
that $\mathcal{M} \overset{\varepsilon}{\sim}_F \mathcal{B}$ holds, but we can never be confident in its negation; 
rather, it may be the case that the evidence in favor of $\mathcal{M} \overset{\varepsilon}{\sim}_F \mathcal{B}$ is 
insufficient because we simply have not yet collected enough data. 
This leads immediately to the following theorem: 

Theorem 4. For any target power $\beta_{\min} > \alpha$, there is no universal $n$, 
$n'$ that guarantees $\beta \geq \beta_{\min}$. 

Therefore, even ε-supervenience does not satisfy Popper's falsifiability 
criterion 11 . 

The feasibility of a consistent statistical test for ε-supervenience. 
Theorem 3 demonstrates the availability of a consistent test under 
certain restrictions. Theorem 4, however, demonstrates that convergence 
rates might be unbearably slow. We therefore provide an 
illustrative example of the feasibility of such a test on synthetic data. 

Caenorhabditis elegans is a species whose nervous system is 
believed to consist of the same 302 labeled neurons for each organism 12 . 
Moreover, these animals exhibit a rich behavioral repertoire 
that seemingly depends on circuit properties 13 . These findings motivate 
the use of C. elegans for a synthetic data analysis 14 . Conducting 
such an experiment requires specifying a joint distribution $F_{MB}$ over 
brain-graphs and behaviors. The joint distribution decomposes into 
the product of a class-conditional distribution (likelihood) and a 
prior, $F_{MB} = F_{B|M} F_M$. The prior specifies the probability of any 
particular organism exhibiting the behavior. The class-conditional 
distribution specifies the brain-graph distribution given that the 
organism does (or does not) exhibit the behavior. 

Let $A_{uv}$ be the number of chemical synapses between neuron $u$ and 
neuron $v$ according to 15 . Then, let $\mathcal{S}$ be the set of edges deemed 
responsible for odor-evoked behavior according to 16 . If odor-evoked 
behavior is supervenient on this signal subgraph $\mathcal{S}$, then the distribution 
of edges in $\mathcal{S}$ must differ between the two classes of odor-evoked 
behavior 17 . Let $E_{uv|j}$ denote the expected number of edges 
from vertex $v$ to vertex $u$ in class $j$. For class $m_0$, let $E_{uv|0} = A_{uv} + \eta$, 
where $\eta = 0.05$ is a small noise parameter (it is believed that the 
C. elegans connectome is similar across organisms 12 ). For class $m_1$, let 
$E_{uv|1} = A_{uv} + z_{uv}$, where the signal parameter $z_{uv} = \eta$ for all edges 
not in $\mathcal{S}$, and $z_{uv}$ is uniformly sampled from $[-5, 5]$ for all edges 
within $\mathcal{S}$. For both classes, let each edge be Poisson distributed, 
$F_{A_{uv} \mid M = m_j} = \mathrm{Poisson}(E_{uv|j})$. 
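
The generative model just described can be sketched in a few lines of Python. This is not the code used for the simulation reported here; it is a minimal illustration with several loudly stated assumptions: the baseline matrix `A` is a random 5-vertex placeholder rather than the real connectome, the signal edge set is arbitrary, all function names are hypothetical, and rates that would be non-positive after adding $z_{uv}$ are clipped to a small positive floor (the text does not say how that case is handled).

```python
import math
import random

def sample_poisson(rate, rng):
    """Knuth's multiplication method; adequate for the small rates used here."""
    threshold = math.exp(-rate)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count

def class_rates(A, signal_edges, z, label, eta=0.05, floor=1e-6):
    """Expected edge counts E_{uv|j} for class j = label (0 or 1)."""
    n = len(A)
    # Off the signal subgraph both classes share the rate A_uv + eta.
    rates = [[A[u][v] + eta for v in range(n)] for u in range(n)]
    if label == 1:
        for (u, v) in signal_edges:
            # ASSUMPTION: clip to a positive floor so the Poisson rate is valid.
            rates[u][v] = max(A[u][v] + z[(u, v)], floor)
    return rates

def sample_graph(rates, rng):
    """Draw one multigraph adjacency matrix with independent Poisson edges."""
    return [[sample_poisson(r, rng) for r in row] for row in rates]

rng = random.Random(0)
n_vertices = 5  # placeholder size, not the 279-neuron connectome
A = [[rng.randint(0, 3) for _ in range(n_vertices)] for _ in range(n_vertices)]
signal_edges = [(0, 1), (2, 3)]  # arbitrary stand-in for the signal subgraph
z = {edge: rng.uniform(-5, 5) for edge in signal_edges}  # drawn once, then fixed

g0 = sample_graph(class_rates(A, signal_edges, z, label=0), rng)
g1 = sample_graph(class_rates(A, signal_edges, z, label=1), rng)
print(g0)
print(g1)
```

Note that the signal parameters `z` are sampled once and then held fixed, so that every class-1 graph shares the same expected edge counts; only the Poisson draws vary between organisms.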

We consider $k_n$-nearest neighbor classification of labeled multigraphs 
(directed, with loops) on 279 vertices under the Frobenius norm (the 
C. elegans somatic nervous system has only 279 neurons that make 
synapses with other neurons). The $k_n$-nearest neighbor classifier 
used here satisfies $k_n \to \infty$ as $n \to \infty$ and $k_n/n \to 0$ as $n \to \infty$, ensuring 
universal consistency. (Better classifiers can be constructed for the 
joint distribution $F_{MB}$ used here; however, we demand universal 
consistency.) Figure 1 shows that for this simulation, rejecting the null 
in favor of ($\varepsilon = 0.1$)-supervenience at $\alpha = 0.01$ requires only a few hundred 
training samples. 

Importantly, conducting this experiment in actu is not beyond 
current technological limitations. 3D superresolution imaging 18 
combined with neurite tracing algorithms 19, 20, 21 allows the collection 
of a C. elegans brain-graph within a day. Genetic manipulations, laser 
ablations, and training paradigms can each be used to obtain a non-wild 
type population for use as $M = m_1$, and the class of each 
organism ($m_0$ vs. $m_1$) can also be determined automatically 22 . 

Figure 1 | C. elegans graph classification simulation results. The 
estimated hold-out misclassification rate $\hat{L}_F^{n'}(g_n)$ (with $n' = 1000$ testing 
samples) is plotted as a function of class-conditional training sample size 
$n_j = n/2$, suggesting that for $\varepsilon = 0.1$ we can determine that $\mathcal{M} \overset{\varepsilon}{\sim}_F \mathcal{B}$ holds 
with 99% confidence with just a few hundred training samples generated 
from $F_{MB}$. Each dot depicts $\hat{L}_F^{n'}(g_n)$ for some $n$; standard errors are 
$(\hat{L}_F^{n'}(g_n)(1 - \hat{L}_F^{n'}(g_n))/n')^{1/2}$. For example, at $n_j = 180$ we have 
$k_n = \lfloor \sqrt{8n} \rfloor = 53$ (where $\lfloor \cdot \rfloor$ indicates the floor operator), $\hat{L}_F^{n'}(g_n) = 0.057$, 
and standard error less than 0.01. We reject $H_0^{0.1}: L^* \geq 0.1$ at $\alpha = 0.01$. 
Note that $L^* \approx 0$ for this simulation. 

Discussion 

This work makes the following contributions. First, we define statistical 
supervenience based on Davidson's canonical statement 
(Definition 1). This definition makes it apparent that supervenience 
implies the possibility of perfect classification (Theorem 1). We then 
prove that there is no viable test for supervenience, so one can 
never reject a null hypothesis in favor of supervenience, regardless of 
the amount of data (Theorem 2). This motivates the introduction of a 
relaxed notion called ε-supervenience (Definition 2), for which 
consistent statistical tests are readily available. Under a very general 
brain-graph/mental property model (Gedankenexperiment 1), a consistent 
statistical test for ε-supervenience is always available, no 
matter the true distribution $F_{MB}$ (Theorem 3). In other words, the 
proposed test is guaranteed to reject the null whenever the null is 
false, given sufficient data, for any possible distribution governing 
mental property/brain property pairs. 

Alas, arbitrarily slow convergence theorems demonstrate that 
there is no universal $n, n'$ for which convergence is guaranteed 
(Theorem 4). Thus, a failure to reject is ambiguous: even if the data 
satisfy the above assumptions, the failure to reject may be due to 
either (i) an insufficient amount of data or (ii) $\mathcal{M}$ not being 
ε-supervenient on $\mathcal{B}$. Moreover, the data will not, in general, satisfy 
the above assumptions. In addition to dependence (because each 
human does not exist in a vacuum), the mental property measurements 
will often be "noisy" (for example, accurately diagnosing psychiatric 
disorders is a sticky wicket 23 ). Nonetheless, synthetic data 
analysis suggests that under somewhat realistic assumptions, convergence 
obtains with an amount of data one might conceivably 
collect (Figure 1 and ensuing discussion). 

Thus, given measurements of mental and brain properties that we 
believe reflect the properties of interest, and given a sufficient 
amount of data satisfying the independent and identically sampled 
assumption, a rejection of $H_0^{\varepsilon}: L^* \geq \varepsilon$ in favor of $\mathcal{M} \overset{\varepsilon}{\sim}_F \mathcal{B}$ entails 
that we are $100(1 - \alpha)\%$ confident that the mental property under 
investigation is ε-supervenient on the brain property under investigation. 
Unfortunately, failure to reject is more ambiguous. 

Interestingly, much of contemporary research in neuroscience and 
cognitive science could be cast as mind-brain supervenience investigations. 
Specifically, searches for "engrams" of memory traces 24 or 
"neural correlates" of various behaviors or mental properties (for 
example, consciousness 25 ), may be more aptly called searches for 
the "neural supervenia" of such properties. Letting the brain property 
be a brain-graph is perhaps especially pertinent in light of the 
advent of "connectomics" 26, 27 , a field devoted to estimating whole 
organism brain-graphs and relating them to function. Testing supervenience 
of various mental properties on these brain-graphs will 
perhaps therefore become increasingly compelling; the framework 
developed herein could be fundamental to these investigations. For 
example, the question of whether connectivity structure alone is 
sufficient to explain a particular mental property is one possible 
mind-brain ε-supervenience investigation. The above synthetic data 
analysis demonstrates the feasibility of ε-supervenience tests on small 
brain-graphs. Note that ε-supervenience tests need not investigate 
seemingly intractable problems, like consciousness. For example, 
aspects of visual perception appear to supervene on visual cortical 
activity (for example, binocular rivalry 28 ). Moreover, an inability to 
reject ε-supervenience for small ε is also potentially meaningful. For 
example, perhaps auditory localization precision supervenes on a 
rate code only to some ε > c, the rest supervening on a spike timing 
code 29 . Similar supervenience tests on increasingly complex mental 
properties will potentially benefit from either higher-throughput 
imaging modalities 30, 31 , more coarse brain-graphs 32, 33 , or both. 

Methods 

The 1-nearest neighbor (1-NN) classifier works as follows. Compute the distance 
between the test brain $b$ and all $n$ training brains, $d_i = d(b, b_i)$ for all $i \in [n]$, where 
$[n] = \{1, 2, \ldots, n\}$. Then, sort these distances, $d_{(1)} < d_{(2)} < \cdots < d_{(n)}$, and consider their 
corresponding minds, $m_{(1)}, m_{(2)}, \ldots, m_{(n)}$, where parenthetical indices indicate rank 
order among the distances. The 1-NN algorithm predicts that the unobserved mind is of 
the same class as the closest brain's class: $\hat{m} = m_{(1)}$. The $k_n$ nearest neighbor is a 
straightforward generalization of this approach. It says that the test mind is in the 
same class as whichever class is the plurality class among the $k_n$ nearest neighbors, 
$\hat{m} = \operatorname{argmax}_{m'} \sum_{i=1}^{k_n} \mathbb{I}\{m_{(i)} = m'\}$. Given a particular choice of $k_n$ (the number of 
nearest neighbors to consider) and a choice of $d(\cdot, \cdot)$ (the distance metric used to 
compare the test datum and training data), one has a relatively simple and intuitive 
algorithm. 
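
The procedure above translates almost line for line into code. The Python sketch below is a minimal illustration, not the implementation used for Figure 1: the function names and the 2-vertex toy graphs are hypothetical, and the rule $k_n = \lfloor \sqrt{8n} \rfloor$ is taken from the Figure 1 caption as one choice satisfying $k_n \to \infty$ and $k_n/n \to 0$.

```python
import math
from collections import Counter

def frobenius_distance(a, b):
    """Frobenius norm of the difference of two equally sized adjacency matrices."""
    return math.sqrt(
        sum((x - y) ** 2 for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b))
    )

def knn_predict(test_brain, training, k):
    """Plurality vote among the k training brains nearest to `test_brain`.

    `training` is a list of (mind, brain) pairs. Vote ties go to the class
    whose member appears first in distance order (Counter keeps first-seen order).
    """
    ranked = sorted(training, key=lambda pair: frobenius_distance(test_brain, pair[1]))
    votes = Counter(mind for mind, _ in ranked[:k])
    return votes.most_common(1)[0][0]

def k_n(n):
    """Number of neighbors for n training samples: floor(sqrt(8 n)), at least 1."""
    return max(1, math.isqrt(8 * n))

# Hypothetical toy training set of 2-vertex multigraphs.
training = [
    ("m0", [[0, 0], [0, 0]]),
    ("m0", [[0, 1], [0, 0]]),
    ("m1", [[5, 5], [5, 5]]),
    ("m1", [[4, 5], [5, 5]]),
]

print(knn_predict([[0, 0], [1, 0]], training, k=3))  # m0
print(knn_predict([[5, 4], [5, 5]], training, k=3))  # m1
print(k_n(360))                                      # 53
```

Setting `k=1` recovers the 1-NN rule. Flattening each adjacency matrix into a vector and taking the Euclidean distance gives the same ranking as the Frobenius distance used here.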

Let $g_n$ be the $k_n$ nearest neighbor ($k_n$NN) classifier when there are $n$ training 
samples. A collection of such classifiers $\{g_n\}$, with $k_n$ increasing with $n$, is called a 
classifier sequence. A universally consistent classifier sequence is any classifier 
sequence that is guaranteed to converge to the Bayes optimal classifier regardless of 
the true distribution from which the data were sampled; that is, a universally consistent 
classifier sequence satisfies $L_F(g_n) \to L_F(g^*)$ as $n \to \infty$ for all $F_{MB}$. In the main 
text, we refer to the whole sequence as a classifier. 

The $k_n$NN classifier is consistent if (i) $k_n \to \infty$ as $n \to \infty$ and (ii) $k_n/n \to 0$ as $n \to \infty$ 34 . 
In Stone's original proof 34 , $b$ was assumed to be a $q$-dimensional vector, and the $L_2$ 
norm ($d(b, b') = \sum_{j=1}^{q} (b_j - b'_j)^2$, where $j$ indexes elements of the $q$-dimensional 
vector) was shown to satisfy the constraints on a distance metric for this collection of 
classifiers to be universally consistent. Later, others extended these results to apply to 
any $L_p$ norm 9 . When brain-graphs are represented by their adjacency matrices, one 
can stack the columns of the adjacency matrices, effectively embedding graphs into a 
vector space, in which case Stone's theorem applies. Stone's original proof also applied 
to the scenario when $|\mathcal{M}|$ was infinite, resulting in a universally consistent regression 
algorithm as well. 

Note that the above extension of Stone's original theorem to the graph domain 
implicitly assumed that vertices were labeled, such that elements of the adjacency 
matrices could easily be compared across graphs. In theory, when vertices are 
unlabeled, one could first map each graph to a quotient space invariant to isomorphisms, 
and then proceed as before. Unfortunately, there is no known polynomial 
time complexity algorithm for graph isomorphism 35 , so in practice, dealing 
with unlabeled vertices will likely be computationally challenging 8 . 